23 research outputs found

    Algorithms and Architectures for Network Search Processors

    Get PDF
    The continuous growth in the Internetā€™s size, the amount of data traļ¬ƒc, and the complexity of processing this traļ¬ƒc gives rise to new challenges in building high-performance network devices. One of the most fundamental tasks performed by these devices is searching the network data for predeļ¬ned keys. Address lookup, packet classiļ¬cation, and deep packet inspection are some of the operations which involve table lookups and searching. These operations are typically part of the packet forwarding mechanism, and can create a performance bottleneck. Therefore, fast and resource eļ¬ƒcient algorithms are required. One of the most commonly used techniques for such searching operations is the Ternary Content Addressable Memory (TCAM). While TCAM can oļ¬€er very fast search speeds, it is costly and consumes a large amount of power. Hence, designing cost-eļ¬€ective, power-eļ¬ƒcient, and high-speed search techniques has received a great deal of attention in the research and industrial community. In this thesis, we propose a generic search technique based on Bloom ļ¬lters. A Bloom ļ¬lter is a randomized data structure used to represent a set of bit-strings compactly and support set membership queries. We demonstrate techniques to convert the search process into table lookups. The resulting table data structures are kept in the oļ¬€-chip memory and their Bloom ļ¬lter representations are kept in the on-chip memory. An item needs to be looked up in the oļ¬€-chip table only when it is found in the on-chip Bloom ļ¬lters. By ļ¬ltering the oļ¬€-chip memory accesses in this fashion, the search operations can be signiļ¬cantly accelerated. Our approach involves a unique combination of algorithmic and architectural techniques that outperform some of the current techniques in terms of cost-eļ¬€ectiveness, speed, and power-eļ¬ƒciency

    Synthesizable Design of a Multi-Module Memory Controller

    Get PDF
    Random Access Memory (RAM) is a common resources needed by networking hardware modules. Synchronous Dynamic RAM (SDRAM) provides a cost effective solution for such data storage. As the packet processing speeds in the hardware increase memory throughput can be a bottleneck to achieve overall high performance. Typically there are multiple hardware modules which perform different operations on the packet payload and hence all try to access the common packet buffer simultaneously. This gives rise to a need for a memory controller which arbitrates between the memory requests made by different modules and maximizes the memory throughput. This paper discusses the design and implementation of a SDRAM controller which satisfies both the requirements. The memory throughput depends on the burst lengths, the address pattern of the memory accesses and the type of memory access (read/write). Given the information about the current SDRAM access and the pending SDRAM access requests, the controller finds the memory access request among the pending requests which utilizes the data bus most efficiently and increases the throughput. This leads to the re-ordering of the memory requests between modules. Results show how this controller improves the overall throughput

    Fast Packet Classification Using Bloom Filters

    Get PDF
    While the problem of general packet classification has received a great deal of attention from researchers over the last ten years, there is still no really satisfactory solution. Ternary Content Addressable Memory (TCAM), although widely used in practice, is both expensive and consumes a lot of power. Algorithmic solutions, which rely on commodity memory chips, are relatively inexpensive and power-efficient, but have not been able to match the generality and performance of TCAMs. In this paper we propose a new approach to packet classification, which combines architectural and algorithmic techniques. Our starting point is the well-known crossproducting algorithm, which is fast but has significant memory overhead due to the extra rules needed to represent the crossproducts. We show how to modify the crossproduct method in a way that drastically reduces the memory required, without compromising on performance. We avoid unnecessary accesses to off-chip memory by filtering off-chip accesses using on-chip Bloom filters. For packets that match p rules in a rule set, our algorithm requires just 4 + p + Ē« independent memory accesses on average, to return all matching rules, where Ē« Ć” 1 is a small constant that depends on the false positive rate of the Bloom filters. Each memory access is just 256 bits, making it practical to classify small packets at OC-192 link rates using two commodity SRAM chips. For rule set sizes ranging from a few hundred to several thousand filters, the average rule set expansion factor attributable to the algorithm is just 1.2. The memory consumption per rule is 36 bytes in the average case

    Generalized RAD Module Interface Specification of the Field-programmable Port eXtender (FPX) Version 2.0

    Get PDF
    The Field-programmable Port eXtender (FPX) provides dynamic, fast, and flexible mechanisms to process data streams at the ports of the Washington University Gigabit Switch (WUGS-20). By performing all computations in FPGA hardware, cells and packets can be processed at the full line speed of the transmission interface, currently 2.4 Gbits/sec. In order to design and implement portable hardware modules for the Reprogrammable Application Devide (RAD) on the FPX board, all modules should conform to a standard interface. This standard interface specifies how modules receive and transmit ATM cells of data flows, prevent data loss during reconfiguration, and access off-chip memory. Module designers should conform to the standard I/O signal names and take special note of timing diagram references

    System-on-Chip Packet Processor for an Experimental Network Services Platform

    Get PDF
    As the focus of networking research shifts from raw performance to the delivery of advanced network services, there is a growing need for open-platform systems for extensible networking research. The Applied Research Laboratory at Washington University in Saint Louis has developed a ļ¬‚exible Network Services Platform (NSP) to meet this need. The NSP provides an extensible platform for prototyping next-generation network services and applications. This paper describes the design of a system-on-chip Packet Processor for the NSP which performs all core packet processing functions including segmentation and reassembly, packet classiļ¬cation, route lookup, and queue management. Targeted to a commercial conļ¬gurable logic device, the system is designed to support gigabit links and switch fabrics with a 2:1 speed advantage. We provide resource consumption results for each component of the Packet Processor design

    Fast and scalable pattern matching for content filtering

    No full text
    High-speed packet content inspection and filtering devices rely on a fast multi-pattern matching algorithm which is used to detect pre-defined keywords or signatures in the packets. Multi-pattern match-ing is known to require intensive memory accesses and is often a performance bottleneck. Hence specialized hardware-accelerated algorithms are being developed for line-speed packet processing. While several pattern matching algorithms have already been de-veloped for such applications, we find that most of them suffer from scalability issues. To support a large number of patterns, the throughput is compromised or vice versa. We present a hardware-implementable pattern matching algo-rithm for content filtering applications, which is scalable in terms of speed, the number of patterns and the pattern length. We modify the classic Aho-Corasick algorithm to consider multiple characters at a time for higher throughput. Furthermore, we suppress a large fraction of memory accesses by using Bloom filters implemented with a small amount of on-chip memory. The resulting algorithm can support matching of several thousands of patterns at more than 10 Gbps with the help of a less than 50 KBytes of embedded mem-ory and a few megabytes of external SRAM. We demonstrate the merit of our algorithm through theoretical analysis and simulations performed on Snortā€™s string set

    Fast and scalable pattern matching for network intrusion detection systems

    No full text
    Abstract ā€” High-speed packet content inspection and filtering devices rely on a fast multi-pattern matching algorithm which is used to detect predefined keywords or signatures in the packets. Multi-pattern matching is known to require intensive memory accesses and is often a performance bottleneck. Hence specialized hardware-accelerated algorithms are required for line-speed packet processing. We present hardware-implementable pattern matching algorithm for content filtering applications, which is scalable in terms of speed, the number of patterns and the pattern length. Our algorithm is based on a memory efficient multi-hashing data structure called Bloom filter. We use embedded on-chip memory blocks in FPGA/VLSI chips to construct Bloom filters which can suppress a large fraction of memory accesses and speed up string matching. Based on this concept, we first present a simple algorithm which can scan for several thousand short (up to 16 bytes) patterns at multi-gigabit per second speeds with a moderately small amount of embedded memory and a few mega bytes of external memory. Furthermore, we modify this algorithm to be able to handle arbitrarily large strings at the cost of a little more on-chip memory. We demonstrate the merit of our algorithm through theoretical analysis and simulations performed on Snortā€™s string set

    Design and implementation of a string matching system for network intrusion detection using FPGA-based bloom filters

    Get PDF
    Modern Network Intrusion Detection Systems (NIDS) inspect the network packet payload to check if it conforms to the security policies of the given network. This process, often referred to as deep packet inspection, involves detection of predefined signature strings or keywords starting at an arbitrary location in the payload. String matching is a computationally intensive task and can become a potential bottleneck without high-speed processing. Since the conventional software-implemented string matching algorithms have not kept pace with the increasing network speeds, special purpose hardware solutions have been introduced. In this paper we show how Bloom filters can be used effectively to perform string matching for thousands of strings at wire speed. We describe how Bloom filters can be implemented feasibly with FPGA. Technology analysis shows that this approach for string matching is more effective than the current FPGA-based string matching circuits which use Deterministic or Non-deterministic Finite Automata (DFA or NFA). Finally, we give the details of our implementation of string matching technique on Xilinx XCV 2000E FPGA.
    corecore